In this paper we present a simulation environment enhanced with
parallel processing which can be used on personal computers, 
based on a high-level user interface developed on Mathematica© 
which is connected to C++ code in order to make our platform 
capable of communicating with a Graphics Processing Unit. We 
introduce the reader to the behavior of our proposal by simu- 
lating a quantum adiabatic algorithm designed for solving hard 
instances of the 3-SAT problem. We show that our simulator 
is capable of significantly increasing the number of qubits that 
can be simulated using classical hardware. Finally, we present 
a review of currently available classical simulators of quantum 
systems together with some justifications, based on our willing- 
ness to further understand processing properties of Nature, for 
devoting resources to building more powerful simulators. 
1 INTRODUCTION



Quantum Computation, one of the most recent joint ventures between physics 
and computer science, is a promising emerging branch of science and tech- 
nology aiming at providing us with algorithms and experimental devices that 
allow us to exploit quantum effects of physical systems, in order to per- 
form simulations and calculations. Quantum Computing promises great ad- 
vances in the solution of some problems for which we know no efficient al- 
gorithms under the classical computer models and systems we currently have 
[39]. Moreover, current results and developments on both theoretical (e.g.
Il39l|24l[30l|32l[l3l|28]) and experimental (e.g. ETl [311 l47lfT0l 146117115911611') 
arenas of quantum computing have resulted in an increased interest of several 
applied scientific communities to cross-fertilize their own fields with
techniques and ideas from this discipline (e.g. [55], [57], [56], [5], [14].)

One of the main problems a computer scientist faces when learning and 
working on the development of quantum algorithms is the counterintuitive 
behavior of quantum mechanical systems. For this reason, together with the 
need to test experimental proposals before implementing them, building pow- 
erful classical computer platforms for the simulation of quantum systems is 
crucial in order to develop intuition about the behavior of quantum systems 
used for computational purposes, as well as to realize the approximate behav- 
ior of practical implementations of quantum algorithms. Particularly, quanti- 
fying resources required to process information and/or to compute a solution, 
i.e. to assess the complexity of the process, is a prioritized research area, as it 
allows us to estimate implementation costs, as well as to compare problems 
by comparing the complexity of their solutions. In summary, building simula- 
tors for quantum algorithms in classical computers would allow the scientific 
community to study and analyze the expected behavior and potential of these 
algorithms on future quantum computers. 

Developing classical computer simulations of quantum algorithms usually
has two drawbacks: i) running such simulations is frequently a highly
demanding task (i.e. an exponential amount of computational resources is
typically needed for exact simulations), and ii) due to the computer
languages typically used for such classical simulations (e.g. C, C++,
Python), computer scientists usually have a hard time focusing on solving
the problem in mind because of the overwhelming low-level programming
details, i.e. high-level languages and better interfaces are needed.

In this paper we introduce a parallel hardware and software simulation
platform for quantum algorithms with high-level user interfaces. Our
simulation environment is based on a high-level user interface developed on
Mathematica© which is connected to C++ code in order to make our platform
capable of communicating with a Graphics Processing Unit (GPU).

Our simulation environment is designed to take full advantage of multi- 
core parallel processing capabilities on the GPU in order to enhance the per- 
formance of such classical simulations thus giving scientists the option to 
work with more extensive problems in less time and without the need to ac- 
cess grid or cloud infrastructures. The high level interface, compiled as a 
Mathematica© add-on called Quantum©, allows scientists to express their 
algorithms using the Dirac notation without having to translate them into a 
matrix form. Then, we use Mathlink© to send the information from Mathe- 
matica to C++ code prepared to deploy the parallel tasks to the multiple cores 
contained in the GPU. Our CUDA interface allows users to communicate with 
kernels prepared to solve specific problems or with the linear algebra CUDA 
libraries CUBLAS and CULA. 

Our proposal could be used by quantum scientists to enhance the
performance of quantum computing simulations using a single PC equipped
with an NVIDIA© CUDA-compatible GPU. Moreover, our user-friendly
interfaces hide the technical details (i.e. the coding complexity) of
building parallel algorithms for GPUs by creating kernels. In the example
we present in this paper, we have designed such kernels to simulate hard
instances of an NP-complete problem, the 3-satisfiability problem (3-SAT)
[21], [54]. We present results for benchmarks performed with a variety of
instances of the 3-SAT problem running with our simulator on the CPU and
the GPU.

The rest of this paper is organized as follows: we start by providing the
reader with preliminary information about quantum adiabatic computation,
the 3-SAT problem and a concise review of classical simulation of quantum
algorithms, as these three topics are needed in order to properly describe
the structure of our contribution. That section is followed by a reflection
on the relationship between natural parallel processing and computer
parallel processing; our comments there are a contribution towards
realizing how massive distributed-parallel computer systems can be used to
learn more about Nature and her processes. We then proceed to introduce the
reader to the theoretical and practical foundations of our proposal,
followed by numerical results produced by simulating a quantum adiabatic
algorithm designed to solve hard instances of 3-SAT. We finish this paper
with a conclusions section.
2 PRELIMINARIES: QUANTUM ADIABATIC COMPUTATION, THE
3-SAT PROBLEM, AND A CONCISE REVIEW AND JUSTIFICATION
OF CLASSICAL SIMULATION OF QUANTUM ALGORITHMS

The purpose of this section is to provide the reader with the preliminary
concepts upon which we have built our proposal. We start by delivering the
basics of Adiabatic Quantum Computation, as that is the universal model of
quantum computation we have employed in order to build an example to show
the capacities of our simulation platform. Furthermore, we have used the
Hamiltonian proposed in [44] for solving hard instances of the 3-SAT
problem by adiabatic evolution, which is why we also introduce the
definition, main characteristics and an example of the 3-SAT problem. We
finish this section by providing a concise review of currently existing
classical simulators of quantum algorithms.

2.1 Quantum Adiabatic Computation 

The realization of a robust quantum computer must fulfill several
requirements [18], including the development of universal models of quantum
computation. Among such models we find Adiabatic Quantum Computation
(AQC) [20], [19], a promising paradigm of quantum computing due to its
robustness [33], [17], its encouraging results in the study of NP-complete
problems [20], [19], [25], [60], [44], and its implementation for the study
of statistical mechanical complex problems such as protein folding [43].

The goal of AQC algorithms is to transform an initial ground state
$|\psi(0)\rangle$ into a final ground state $|\psi(\tau)\rangle$, which
encodes the answer to the problem. This is achieved by evolving the
corresponding physical system according to the Schrödinger equation with a
time-dependent Hamiltonian $H(t)$. The AQC algorithm relies on the quantum
adiabatic theorem [35], [19], which states that the time propagation of the
quantum state will remain very close to the instantaneous ground state for
all $t \in [0, \tau]$, whenever $H(t)$ varies slowly enough throughout the
propagation time $t \in [0, \tau]$ and assuming the ground state manifold
does not cross the energy levels which lead to excited states of the final
Hamiltonian. Here, we denote by ground state manifold the first $m$ curves
associated with the lowest eigenvalue of the time-dependent Hamiltonian for
$t \in [0, \tau]$, where $m$ is the degeneracy of the final Hamiltonian
ground state.

Conventionally, the adiabatic evolution path is the linear sweep of
$s \in [0, 1]$, where $s = t/\tau$:

$$H(s) = (1 - s)\,H_i + s\,H_f. \tag{1}$$






$H_i$ is usually chosen such that its ground state is a uniform
superposition of all possible $2^n$ computational basis vectors. Here, we
choose the spin states $\{|q_i = 0\rangle, |q_i = 1\rangle\}$, which are
the eigenvectors of $\sigma_i^z$ with eigenvalues +1 and -1, respectively,
as the basis vectors. Then the initial ground state is

$$|\psi_g(0)\rangle = \frac{1}{\sqrt{2^n}} \sum_{q_i \in \{0,1\}} |q_n\rangle |q_{n-1}\rangle \cdots |q_2\rangle |q_1\rangle.$$

Such an initial ground state is usually assumed to be easy to prepare and
it results in a quantum state with equal probability of all possible
solutions.
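
As a hedged illustration of the linear sweep in Eq. (1), the sketch below builds $H(s)$ for a single qubit and tracks the gap between its two eigenvalues. The specific choices $H_i = (1 - \sigma^x)/2$ and $H_f = \mathrm{diag}(0, 1)$, as well as all function names, are assumptions made for this example and are not taken from the simulator described in this paper.

```python
import math

# Assumed single-qubit Hamiltonians (illustrative only):
#   H_i = (1 - sigma_x) / 2, whose ground state is the uniform superposition
#   H_f = diag(0, 1), whose ground state is the basis state |0>
H_I = [[0.5, -0.5], [-0.5, 0.5]]
H_F = [[0.0, 0.0], [0.0, 1.0]]


def interpolate(s):
    """Return H(s) = (1 - s) * H_i + s * H_f as a 2x2 nested list."""
    return [[(1.0 - s) * H_I[r][c] + s * H_F[r][c] for c in range(2)]
            for r in range(2)]


def eigenvalues_2x2(h):
    """Closed-form eigenvalues of a real symmetric 2x2 matrix, ascending."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius


def gap(s):
    """Energy difference between the excited and ground levels of H(s)."""
    low, high = eigenvalues_2x2(interpolate(s))
    return high - low


if __name__ == "__main__":
    for step in range(5):
        s = step / 4.0
        print(f"s = {s:.2f}  gap = {gap(s):.4f}")
```

In this toy case the gap equals $\sqrt{s^2 + (1 - s)^2}$, so it is smallest at the midpoint of the sweep and never closes.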



2.2 3-SAT 

For $K \geq 3$, K-SAT is an NP-complete problem [21], [54] and instances of
this problem are particularly difficult to solve when the ratio of number
of clauses to number of variables is about 4.2. Studying the properties of
3-SAT is an important area of research, not only because a polynomial-time
solution to 3-SAT would imply P = NP, but also because 3-SAT may be used
to model problems and procedures in theoretical computer science [1] as
well as in several areas of applied computer science and engineering like
artificial intelligence [22], [36]. We now provide the reader with a
concise introduction to the K-SAT problem together with an example of a
3-SAT instance.

The K-SAT Problem. Let
$A = \{e_1, e_2, \ldots, e_n, \bar{e}_1, \bar{e}_2, \ldots, \bar{e}_n\}$
be a set of Boolean variables $E = \{e_i\}$ and their negations
$\bar{E} = \{\bar{e}_i\}$. Let us now construct a logical proposition $P$,
defined as $P = \bigwedge_i [(\bigvee_{j=1}^{K} a_j)] = \bigwedge_i C_i$,
where $a_j \in A$, i.e. $P$ is a conjunction of clauses $C_i$ over the set
$A$, where each clause consists of the disjunction of $K$ literals.
Proposition $P$ is a K-SAT instance and the solution of the K-SAT problem,
for instance $P$, consists of finding a set of values for those binary
variables upon which $P$ has been built (i.e. a bitstring), so that
replacing those binary variables with their corresponding binary values
makes $P = 1$, namely, proposition $P$ is satisfied.
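
The definition above can be made concrete with a short sketch. The encoding used here (a literal as a signed integer, `+i` for $e_i$ and `-i` for $\bar{e}_i$) and the tiny three-clause instance are assumptions chosen for illustration; they are not the instance or the data structures used in this paper.

```python
from itertools import product

# Hypothetical encoding: literal +i stands for e_i, literal -i for its negation.
# Illustrative 3-SAT instance over n = 3 variables (not taken from the paper).
CLAUSES = [(1, 2, 3), (-1, 2, -3), (1, -2, 3)]
N_VARS = 3


def literal_value(literal, bits):
    """Value (0 or 1) of one literal under the assignment `bits`."""
    value = bits[abs(literal) - 1]
    return value if literal > 0 else 1 - value


def evaluate(clauses, bits):
    """Return 1 if every clause has at least one true literal, else 0."""
    return int(all(any(literal_value(lit, bits) for lit in clause)
                   for clause in clauses))


def satisfying_assignments(clauses, n_vars):
    """Exhaustively test all 2**n_vars bitstrings and keep those with P = 1."""
    return [bits for bits in product((0, 1), repeat=n_vars)
            if evaluate(clauses, bits) == 1]


if __name__ == "__main__":
    for bits in satisfying_assignments(CLAUSES, N_VARS):
        print(bits)
```

The exhaustive loop visits $2^n$ bitstrings, which is the cost that makes hard instances impractical to solve this way as $n$ grows.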

In order to provide a concise example of what a 3-SAT instance looks like,
together with an attempt to show how difficult solving hard 3-SAT instances
is, let $E = \{x_1, x_2, x_3, x_4, x_5, x_6\}$ be a set of binary variables
and consider a 3-SAT instance specified by
P = (xi y Xj^y X5) A {x2 V V X4) A (xi V a;2 V 2%) A (^3 V a;4 V a::5)A

(2:4 V ^5 V xg) A (afi V afa V ^5) A {xi V af2 V afs) A (a;2 V afa V ar6)A 

(xi V V afe) A (0:3 V V aTg) A (afi V ^2 V ar4) A (a;2 V V X4)A 

(2:2 V X5 V afg) A {x2 V afa V afs) A (af2 V afa V ar4) A (a;2 V a;a V a;6)A 

(afi V 352 V xa) A (afi V ar4 V x^) A (afj V af4 V afg) A (af4 V V a;e)A 

(£2 V a:a V Xg) A (a;2 V a:5 V Xg) A (a;3 V a;5 V afg) A (afi V a;a V ar6)A 

(xa V asjj V a:g) A (x4 V X5 V xg) A {xi V X2 V afa) 

As this example suggests, finding solutions of even a modest 3-SAT
instance can become difficult quite easily (in this case, $P$ has only one
solution: $x_1 = 1$, $x_2 = 1$, $x_3 = 0$, $x_4 = 1$, $x_5 = 0$, $x_6 = 0$.)

2.3 A concise review (and justification) of classical computer simulation 
of quantum algorithms 

For quantum computing practitioners, classical computer simulation of quan- 
tum algorithms is crucial in order to understand and to develop intuition about 
the behavior of quantum systems used for computational purposes, as well as 
to realize the approximate behavior of practical implementations of quantum 
algorithms. 

Early works presented by Omer in pT^, Bettelli et al in fS) and Viamontes 
et al. in [58], among others, introduced the idea of implementing quantum
algorithm simulators using classical computer languages. Later, and among
many other interesting contributions to this field, Nyman proposed using
symbolic classical computer languages for simulating quantum algorithms
[40], Omer proposed abstract semantic structures for modelling quantum
algorithms in classical environments [42], and Altenkirch et al. proposed a
quantum programming language based on classical functional programming [3].
Along with these efforts, several software platforms were developed in
order to simulate quantum algorithms; a comprehensive list of currently
available classical simulators of quantum algorithms can be found in [34].
More recently, the availability of massively distributed computer systems
like grids, clouds and GPUs has attracted the attention of researchers
interested in harnessing those parallel platforms for simulating quantum
algorithms; the work produced by De Raedt [48], Caraiman [12] and this
paper are some examples of this emerging multidisciplinary interest.

In addition to the arguments provided at the beginning of this section, an- 
other attractive application of research results on classical simulation of quan- 
tum systems is the realization of what exactly is quantum about quantum al- 
gorithms, for the following reasons: 
1. We need to understand exactly which properties and operations of
quantum systems cannot be efficiently simulated by classical systems (see
[38] and [11] for most interesting results related to this topic).

2. We also need to realize whether and how exclusively quantum mechan- 
ical properties and operations can be employed for algorithm speed-up. 

An example of the importance of realizing whether truly quantum proper- 
ties can be used for algorithm speed-up was provided in the field of quantum 
walks a few years ago. Since the publication of [37] it had been believed
that the enhanced variance of position distribution in quantum walks was re- 
sponsible (partially at least) for quadratic speed-up of quantum walk-based 
algorithms. However, arguments in favor of the plausibility of using classical 
physics for building experiments which replicate some interference and
statistical properties of quantum walks are given in |^6l, [50], [45], and [51], where
it was shown that it is possible to develop implementations of a quantum walk 
on a line purely described by classical physics (wave interference of electro- 
magnetic fields) and still be able to reproduce the variance enhancement that 
characterizes a discrete quantum walk. For example, the implementation
proposed in [45] utilizes the frequency of a light field as walker and the
spatial path or the polarization state of the same light field as the coin.

3 REFLECTIONS ON NATURAL PARALLEL PROCESSING AND 
COMPUTER PARALLEL PROCESSING 

Nature has developed very quick shortcut procedures in order to reach stable 
configurations (as in the case of protein folding PI) as well as to exhaustively 
compute all possible configurations of a physical system (as in the case of 
quantum superposition and quantum parallelism [9]). If we think of these 
phenomena from a computer science perspective, it is indeed our opinion 
that it is reasonable to hypothesize that Nature uses parallel procedures in 
order to quickly arrive at stable configurations, as well as to fully run natural 
phenomena for which an exponential or factorial amount of computer power 
would be needed for exhaustively computing all possible values or solutions. 
The question, if such a conjecture is to be further explored, is to discover how 
Nature executes such parallel procedures. 

A long-term goal of the authors is to find out how massively parallel com- 
puter platforms can be used for simulating parallel processes performed by 
Nature. This goal includes not only running independent computations (as
would be the case for the computation of all possible values of certain Boolean
functions using quantum parallelism) but also to find out how to simulate cor- 
relations and emergent properties of massively connected networks (e.g. |i6J.) 
This paper is a first step towards such a goal. 

4 SIMULATING ADIABATIC QUANTUM ALGORITHMS ON GPUS 
4.1 GPGPUs 

When the first Graphics Processing Units (GPUs) were designed, they were
intended to support the complex mathematical operations and rendering
required to create visually intensive simulations (see, for example, [53],
[61].) As they evolved, these GPUs attracted the attention of scientists
from other disciplines looking for alternative methods to access high
performance computation. This gave birth to the general purpose graphics
processing units or GPGPUs. The graphics card manufacturer NVIDIA© soon
became one of the most important companies creating single-chip multi-core
GPUs, and they combined them with a software programming interface called
CUDA which allowed programmers to easily take advantage of parallel
processing in their personal computers ([29, 15].) Today, NVIDIA© allows
millions of users to create parallel versions of their algorithms and
simulations without requiring access to grids or clouds. It offers GPUs
with up to 1024 independent cores running at 1.5GHz and their hardware can
be controlled from programs in languages like C, C++, Java, Python and many
others.

In order to take full advantage of the multi-core parallelism, a program is
first analyzed and segmented into sequential and parallel functions.
Sequential functions are preferably run by the CPU as there is no
significant processing gain in running sequential algorithms on multiple
cores. Parallel code, i.e. code without data dependencies, is consolidated
in one or several kernels. Then, the programmer identifies the number of
parallel execution threads required to complete the requested operation. In
the case of NVIDIA© GPUs, these threads are divided into virtual blocks and
grids. Threads inside a block can communicate with each other using shared
memory, but if two blocks of threads need to communicate, they must do so
using the global memory, which is slower than the shared memory. There are
physical limitations to the number of threads a block can contain and to
the number of blocks a grid can contain, bounded by the number of real
cores present in the GPU. Programmers can overcome some of these
limitations by developing further computer code, but the number of actual
threads running at the same time can never surpass the real number of
cores.
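
To make the block and grid organization concrete, the following sketch emulates in plain Python how a one-dimensional kernel launch enumerates its threads. The global index formula `block_idx * block_dim + thread_idx` is the usual one-dimensional CUDA convention; the function names and launch sizes are assumptions for illustration, and no GPU is involved.

```python
def launch_shape(n_items, block_dim):
    """Number of blocks needed so that n_items threads fit in the grid."""
    return (n_items + block_dim - 1) // block_dim  # ceiling division


def emulate_launch(kernel, n_items, block_dim):
    """Call `kernel` once per (block, thread) pair, as a 1-D grid would.

    On a real GPU these calls run concurrently; here they run one by one.
    """
    grid_dim = launch_shape(n_items, block_dim)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            global_idx = block_idx * block_dim + thread_idx
            if global_idx < n_items:  # guard: the last block may be partial
                kernel(global_idx)
    return grid_dim


if __name__ == "__main__":
    out = [0] * 10

    def square(i):
        out[i] = i * i  # no data dependency between different indices

    blocks = emulate_launch(square, n_items=10, block_dim=4)
    print(blocks, out)
```

The bounds check inside the loop mirrors the guard a real kernel needs whenever the number of work items is not a multiple of the block size.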

As in any other distributed processing infrastructure, there is a need to
send information from a central processing element to the distributed
processing units. In the case of NVIDIA© GPUs, CUDA offers different means
to send information from the CPU to the GPU including memory copy, paged
memory and asynchronous communications. This process can slow down the
computation and even result in worse performance than a serial approach if
the design does not carefully take into account the way data is manipulated
inside the GPU [29], [15], hence the need to employ highly specialized
computer programmers for this purpose.

4.2 Mathematica and GPGPU 

The CUDA programming interface allows code from other programming
languages to interact with the NVIDIA© hardware. This interaction allows
the creation of higher level applications which hide from the final user
the inner complexity of distributing parallel tasks to several cores. On
the other hand, in the realm of quantum computing, we have been working
with Mathematica© to create a high-level and high-performance simulation
environment. This high-level application has been compiled into an add-on
called Quantum© ([23]), which allows end users to simulate calculations
using a Dirac notation interface. Mathematica© provides ways to connect its
native user interface with code outside the package in a variety of
programming languages such as C, C++ and Java [49]. The combination of
these two worlds led to the idea of building a bridge between the Quantum©
add-on running on Mathematica© and C++ code that could distribute
processing to the GPUs.

In our platform we have given Quantum© the ability to interact with
tailored functions in CUDA to attack specific problems or to communicate
with the specialized linear algebra libraries CUBLAS and CULA. This has
enabled us to enhance monolithic simulations or atomic operations within
a complex simulation. So far we have worked with Mathematica© 7 and
Mathlink©: we create the data structures using the high-end interface of
Quantum©, then send this information using Mathlink© to C++ code that
deploys blocks of threads in the GPU to satisfy the corresponding requests.
A result is built using information from every thread and then sent back to
Quantum© using Mathlink©. This process gives Quantum© users the ability to
use their desktop or laptop computers as high performance computation
infrastructure to simulate quantum algorithms and quantum processes in a
very user-friendly manner. Mathematica© 8 now integrates a native way to
interact with CUDA which allows the deployment of kernels directly and
without using Mathlink©, thus we expect our kernels to run faster and in a
more integrated way than in the current Mathematica© 7 setup. We present in
Fig. 1 a visualization of data flow among Quantum, Mathematica, Mathlink,
C/C++ code, CUDA, and GPU hardware.

FIGURE 1

Information flow among Quantum, Mathematica, Mathlink, C/C++, CUDA, and
GPUs.

In order to stress differences between parallel and serial simulation of
quantum processes, we show in Fig. 2 three different computational
approaches to solve an instance of a problem. On the left hand side of
Fig. 2 we see an algorithm which uses quantum processing units to solve
the problem at hand: taking advantage of quantum parallelism, we use only
one processing unit for all the solution space, so the computational load
per processing unit is low. In the central portion of Fig. 2 we see a
multi-core, multi-thread GPU-based approach to solve the problem. Here, the
number of cores is limited, but the possible solutions are distributed
among the available cores, so the computational load is higher than in a
quantum approach but lower than in a serial approach. In the right segment
of Fig. 2 we see a classical serial implementation to solve the problem, in
which the computational load per processing unit increases because all the
possible solutions must be tested in only one core.

FIGURE 2

Quantum, multi-core, and serial computational approaches.
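
A rough way to quantify the comparison between the two classical approaches is to count how many candidate solutions each processing unit must test. The sketch below assumes an exhaustive search over $2^n$ bitstrings split evenly among the cores; the function name and sample sizes are illustrative, and the 128-core figure simply matches the card used for the results reported later in this paper.

```python
def candidates_per_unit(n_vars, n_cores):
    """Upper bound on bitstrings tested by one core in an even split."""
    total = 2 ** n_vars
    return (total + n_cores - 1) // n_cores  # ceiling division


if __name__ == "__main__":
    for n in (10, 16, 20):
        serial = candidates_per_unit(n, 1)
        multi = candidates_per_unit(n, 128)
        print(f"n = {n:2d}  serial: {serial:8d}  128 cores: {multi:6d}")
```

The load per core still grows exponentially with the number of variables; the multi-core split only divides it by a constant factor.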

One interesting feature of this project is the ease with which new ker- 
nels can be written. It is very common that one of the biggest obstacles for 
scientists using parallel infrastructure is to be able to transform their serial al- 
gorithms into corresponding parallel versions. Sometimes this process is not 
even suitable for the application and results in worse performance than the 
stand-alone approach. Nevertheless, taking quantum algorithms and deploy- 
ing them into parallel structures is easier because they are already engineered 
to exploit the quantum parallelism. 

4.3 Results 

We have tested our system by building a software platform for simulating an
adiabatic quantum algorithm for solving hard instances of the 3-SAT problem
[44]. The adiabatic quantum algorithm we have simulated consists of the
design of a time-dependent Hamiltonian which can be separated into three
parts. The first part, the initial Hamiltonian, has a ground state that
should be easy to prepare. The second part, the driving Hamiltonian, is in
charge of taking the system from the initial state to the final state. The
third part, the final Hamiltonian, is created from an energy function which
gives every possible state an energy level proportional to the number of
unsatisfied clauses. The energy function depends on the instance and is
constituted by a sum of smaller energy functions, one for each clause. The
ground state of this final Hamiltonian encodes the solution to the problem.

With the purpose of exhibiting the advantages of using GPUs instead of
CPUs for quantum algorithm simulation, we first ran a simulation of the
above-mentioned algorithm on a CPU. Then, we built a specific CUDA kernel
to enhance its performance. The idea of our parallel implementation
consists of simulating quantum parallelism with multi-core parallelism. We
took the energy function over which the above-mentioned Hamiltonian is
built and turned it into a kernel. This way, we can create multiple
processing threads, each of which evaluates one combination of variables
and assigns it an energy level using the function.
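
The idea of one thread per combination of variables can be sketched as follows. Each emulated thread decodes its index into a bitstring, evaluates the energy function and writes the result into its own slot of an output array. This is a serial Python emulation with assumed names and an assumed toy instance, not the CUDA kernel used for the results reported here.

```python
# Hypothetical instance: literal +i means x_i, literal -i means NOT x_i.
CLAUSES = [(1, 2, 3), (-1, 2, -3), (1, -2, 3), (-1, -2, -3), (-1, 2, 3)]
N_VARS = 3


def index_to_bits(index, n_vars):
    """Decode a thread index into a tuple (x_1, ..., x_n); x_1 is the low bit."""
    return tuple((index >> k) & 1 for k in range(n_vars))


def energy_kernel(thread_id, clauses, n_vars, energies):
    """Work of one emulated thread: count unsatisfied clauses for its bitstring."""
    bits = index_to_bits(thread_id, n_vars)
    unsatisfied = 0
    for clause in clauses:
        if not any(bits[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause):
            unsatisfied += 1
    energies[thread_id] = unsatisfied  # each thread writes only its own slot


def run_all_threads(clauses, n_vars):
    """Emulate launching 2**n_vars threads, one per variable combination."""
    energies = [0] * (2 ** n_vars)
    for thread_id in range(len(energies)):
        energy_kernel(thread_id, clauses, n_vars, energies)
    return energies


if __name__ == "__main__":
    energies = run_all_threads(CLAUSES, N_VARS)
    ground = min(energies)
    winners = [index_to_bits(i, N_VARS)
               for i, e in enumerate(energies) if e == ground]
    print(ground, winners)
```

Because no thread reads another thread's result, the evaluations have no data dependencies, which is the property that lets them be consolidated into a single kernel.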

We tested our simulation environment with instances of the 3-SAT problem
with a ratio of number of clauses to number of variables of about 4.2. The
tests were run using a PC with an Intel Core 2 Duo processor @ 2.66GHz, 8GB
of RAM, running Windows Vista, and an NVIDIA GeForce GTX 8800 video card
with 512MB of video memory and 128 parallel cores. The simulation
environment currently runs on Mathematica 7.

In Fig. 3 we present the results obtained in processing time for
different instances of the 3-SAT problem running on the CPU and the GPU. In
Fig. 4 we show a comparison between the results obtained with both devices.
As can be seen in Figs. 3 and 4, the processing time used by the CPU
increases exponentially while the time in the GPU increases at a slower
rate and scales according to the GPU occupancy factor, i.e. the number of
actual parallel cores required to fulfill the processing needs of each
instance. In Fig. 5 we can see the processing time used to simulate
instances of 3-SAT for several numbers of qubits. These results were
limited to the instances that could be simulated within a processing time
frame of 2.5 days.

Based on our results, we observe that the number of qubits simulated using
our GPU tools easily doubles the number simulated on a CPU using our setup.
These results are mainly due to the combination of two characteristics in
our simulation: firstly, we aid the simulation tasks with the power of
multi-core GPU processing with kernels designed to take advantage of the
special memory, thread management and synchronization capabilities of
NVIDIA cards. Secondly, we simulate quantum parallelism directly with
classical multi-core parallelism, which allows us to exploit the GPU
occupancy factor to the maximum on every run.

CPU and GPU execution times for different instances of the 3-SAT problem

Even when the number of possible variable combinations surpasses the
available number of parallel threads in the GPU, we can still get an
excellent performance enhancement. We use shared memory inside each
processing block to reduce the access time to data within the kernel. We
also write the results to global memory concurrently, in separate memory
blocks, to enhance the data throughput.
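
One common way to handle more combinations than available threads is a grid-stride loop, in which each thread processes several indices spaced by the total thread count. Whether the kernels described here use exactly this pattern is not stated above, so the sketch below is only an assumed illustration, emulated serially in Python with hypothetical names.

```python
def grid_stride_indices(thread_id, total_threads, n_items):
    """Indices handled by one thread: thread_id, thread_id + T, thread_id + 2T, ..."""
    return range(thread_id, n_items, total_threads)


def emulate_grid_stride(work, n_items, total_threads):
    """Apply `work` to every index; each emulated thread writes to its own slots."""
    results = [None] * n_items
    for thread_id in range(total_threads):
        for index in grid_stride_indices(thread_id, total_threads, n_items):
            results[index] = work(index)  # distinct threads never share an index
    return results


if __name__ == "__main__":
    # 2**5 = 32 combinations handled by only 8 emulated threads.
    counts = emulate_grid_stride(lambda i: bin(i).count("1"), 32, 8)
    print(counts[:8])
```

Since every index is written by exactly one thread, the results can be stored in separate memory locations without synchronization between threads.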

5 CONCLUSIONS 

In this paper we have presented a GPU-based symbolic and parallel platform
for classically simulating quantum algorithms. Our simulation environment
is based on a high-level user interface developed on Mathematica© which is
connected to C++ code in order to make our platform capable of
communicating with a Graphics Processing Unit. The main contribution of
this work is the creation of a simulation environment enhanced with
parallel processing which can be used on personal computers and enables a
direct comparison between quantum parallelism and classical multi-core
parallelism.

In order to properly introduce the behavior of our proposal we have
simulated a quantum adiabatic algorithm designed for solving hard instances
of the 3-SAT problem. Based on our results, we observe that the number of
qubits that can be efficiently simulated using our GPU tools doubles the
number simulated on a CPU using our setup. These results are possible due
to the combination of two characteristics in our simulation: firstly, we
aid the simulation tasks with the power of multi-core GPU processing with
kernels designed to take advantage of the special memory, thread management
and synchronization capabilities of NVIDIA cards, and secondly, we simulate
quantum parallelism directly with classical multi-core parallelism, which
allows us to maximally exploit the GPU occupancy factor on every run.
Additionally, we have presented a review of currently available classical
simulators of quantum systems together with some justifications, based on
our willingness to further understand processing properties of Nature, for
devoting resources and efforts to building more powerful simulators.